A Hierarchical Semi-Markov Model for Detecting Enrichment with Application to ChIP-Seq Experiments
Authors
Abstract
Chromatin immunoprecipitation followed by direct sequencing (ChIP-Seq) has revolutionized experiments that profile DNA-protein interactions and chromatin remodeling patterns. However, few statistical tools are available for modeling and analyzing ChIP-Seq data thoroughly. We carefully study the data-generating mechanism of ChIP-Seq and propose a new model-based approach for detecting enriched regions. Our model is based on a hierarchical mixture model that gives rise to a zero-inflated negative binomial (ZINB) distribution, coupled with a hidden semi-Markov model (HSMM). Together these address sequencing depth and biases, capture the inherent spatial structure of the data, and allow detection of multiple non-overlapping peaks of variable size. In particular, we demonstrate that the proposed ZINB accounts for the excess zeros and over-dispersion in the observed data relative to a Poisson distribution, and that it provides a better fit as the background distribution. We also propose a new meta false discovery rate (FDR) control at the peak level as an alternative to the usual heuristic postprocessing of enriched bins identified via bin-level FDR control. We show with simulations and case studies that this new procedure allows the boundaries of peak regions to be declared probabilistically and provides accurate FDR control.
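The claim about excess zeros and over-dispersion can be illustrated with a small numerical sketch. This is not the authors' implementation: it only evaluates a generic zero-inflated negative binomial probability mass function and compares it with a Poisson distribution of the same mean. The mean/size parameterization of the negative binomial, the function names, and the parameter values (PI0, MU, SIZE) are all assumptions chosen for illustration.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial pmf, parameterized by mean `mu` and dispersion `size`."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, pi0, mu, size):
    """Zero-inflated NB: with probability pi0 the count is a structural zero,
    otherwise it is drawn from NB(mu, size)."""
    base = (1.0 - pi0) * nb_pmf(k, mu, size)
    return pi0 + base if k == 0 else base


def poisson_pmf(k, lam):
    """Poisson pmf computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def zinb_mean_var(pi0, mu, size):
    """Closed-form mean and variance of the zero-inflated NB."""
    mean = (1.0 - pi0) * mu
    var = mean * (1.0 + mu / size + pi0 * mu)
    return mean, var


# Hypothetical parameter values, chosen only for illustration.
PI0, MU, SIZE = 0.3, 2.0, 1.5

mean, var = zinb_mean_var(PI0, MU, SIZE)
p_zero_zinb = zinb_pmf(0, PI0, MU, SIZE)
p_zero_pois = poisson_pmf(0, mean)  # Poisson matched on the mean

print(f"ZINB mean={mean:.3f}, variance={var:.3f}")
print(f"P(0) under ZINB={p_zero_zinb:.3f}, under matched Poisson={p_zero_pois:.3f}")
```

A Poisson distribution forces the variance to equal the mean, so a ZINB whose variance exceeds its mean and whose zero probability exceeds that of the mean-matched Poisson exhibits both properties the abstract attributes to the observed background counts.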
Similar resources
Hierarchical hidden Markov model with application to joint analysis of ChIP-chip and ChIP-seq data
MOTIVATION: Chromatin immunoprecipitation (ChIP) experiments followed by array hybridization, or ChIP-chip, is a powerful approach for identifying transcription factor binding sites (TFBS) and has been widely used. Recently, massively parallel sequencing coupled with ChIP experiments (ChIP-seq) has been increasingly used as an alternative to ChIP-chip, offering cost-effective genome-wide coverag...
MOSAiCS-HMM: A model-based approach for detecting regions of histone modifications from ChIP-seq data
Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) experiments are routinely utilized for studying epigenomics of transcriptional regulation. We review some of the important statistical issues in the analysis of these experiments and extend our previous model for the analysis of ChIP-seq data of transcription factors, named MOSAiCS, with a hidden Markov model archit...
Exploring the Link Between Gene Expression and Protein Binding by Integrating mRNA Microarray and ChIP-Seq Data
ChIP-sequencing (ChIP-seq) experiments are now routinely used to study genome-wide chromatin marks in epigenetic research. However, due to the high cost and complexity associated with this technology, it is of great interest to investigate whether the results produced by the low-cost option of mRNA microarray experiments can be used in place of ChIP-seq data and what advantages can be achieved ...
In Silico Pooling of ChIP-seq Control Experiments
As next generation sequencing technologies are becoming more economical, large-scale ChIP-seq studies are enabling the investigation of the roles of transcription factor binding and epigenome on phenotypic variation. Studying such variation requires individual level ChIP-seq experiments. Standard designs for ChIP-seq experiments employ a paired control per ChIP-seq sample. Genomic coverage for ...
A statistical framework for power calculations in ChIP-seq experiments
MOTIVATION: ChIP-seq technology enables investigators to study genome-wide binding of transcription factors and mapping of epigenomic marks. Although the availability of basic analysis tools for ChIP-seq data is rapidly increasing, there has not been much progress on the related design issues. A challenging question for designing a ChIP-seq experiment is how deeply should the ChIP and the contro...
Publication date: 2009